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ABSTRACT 

We believe that a wide range of physical processes conspire to shape the ob- 
served galaxy population but we remain unsure of their detailed interactions. 
The semi- analytic model (SAM) of galaxy formation uses multi-dimensional 
paramcterisations of the physical processes of galaxy formation and provides 
a tool to constrain these underlying physical interactions. Because of the high 
dimensionality, the parametric problem of galaxy formation may be profitably 
tackled with a Baycsian-infcrence based approach, which allows one to con- 
strain theory with data in a statistically rigorous way. In this paper we de- 
velop a SAM in the framework of Bayesian inference. We show that, with 
a parallel implementation of an advanced Markov- Chain Monte- Carlo algo- 
rithm, it is now possible to rigorously sample the posterior distribution of the 
high-dimensional parameter space of typical SAMs. As an example, we char- 
acterise galaxy formation in the current ACDM cosmology using the stellar 
mass function of galaxies as an observational constraint. We find that the pos- 
terior probability distribution is both topologically complex and degenerate 
in some important model parameters, suggesting that thorough explorations 
of the parameter space are needed to understand the models. We also demon- 
strate that because of the model degeneracy, adopting a narrow prior strongly 
restricts the model. Therefore, the inferences based on SAMs are conditional 
to the model adopted. Using synthetic data to mimic systematic errors in the 
stellar mass function, we demonstrate that an accurate observational error 
model is essential to meaningful inference. 
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1 INTRODUCTION 



In our current paradigm of structure formation, the matter density of the Universe is dom- 
inate d by cold dark matt er (hereafter CDM), and galaxy formation is a two-stage process 



[e.g. 



White fc Rees 



19781 ). First, small perturbations in the density field, originating from 



quantum fluctuations in the early universe, grow and produce a population of virialised dark 
matter halos. Second, the baryonic matter associated with these halos accumulates at the 
halo centres owing to cooling and cold flows, forming stars and galaxies. Because of the 
hierarchical nature of structure formation in a CDM cosmogony, dark matter halos merge. 
The halo mergers eventually lead to galaxy-galaxy mergers, resulting in the formation of 
elliptical galaxies. 

The first stage of this process, the formation and virialisation of dark matt er halos, has 



been studied in great detail using the 



1974 



1972 



2001 



1997 



Bond et al!l991 



'exten ded) Press-Schechter formalism (e.g. 



Lacev fc Cole 



Fillmore fe Goldreich 



Lu et al 



Bullock et al. 



1984 : 



Press fc Schechter 



1993 ), spherical a nd ellipsoidal collapse (e 



Bertschinger 



1985 



20061) and n umerical simulatio ns (e.g 



2001 



Zhao et al. 



2003a b 



Avila-Reese et a 



Efstathiou et al 



Springe] 



2005 



1998 



1985 



Maccio et al.l 



g 



Gunn fc Gott 



Sheth et al. 



Navarro et al. 



2007 



Zhao et al. 



20091 ). These studies have yielded the mass function, spatial distribution, formation history, 
and internal structure of the CDM halo population and serve as the backbone for any study 
of galaxy formation. The knowledge of the second stage of galaxy formation is far less 
well established, mainly because the baryonic processes involved (cooling, star-formation 
and feedback) are poorly understood. Additional physical processes whose importance is 
not fully understood include dynamical friction, tidal stripping, black hole formation and 
accretion, and adiabatic contraction. 



Hydrodynamic simulations can now 



De uscci 



in a full cosmological context (e.g 



Oppenheimer fc Dave 



2006 



Katz 



Simha et al. 



19921 : 



to study galaxy 



"ormation and evo 



OO9I ) . However, computational power is still a 



Navarro fc White 



1993 



Keres et al. 



ution 



2005 



severe limitation at the present, and one has to compromise between simulation resolu- 
tion and box size. Because of this, an alternative approach, the semi-analytical model of 
galaxy formation, has been de yeloped and widely adopted to study the statistical properties 



of the galaxy population 



Somerville fc Kolatt 



1999 



(e.g. 



White fc Frenk 



1991 



Avila-Reese fc Firmani 



Kauffmann et al 



2000: 



Cole et al. 



1993 



2000: 



Mo et al 



1998 



Firmani fc Avila-Reese 



E-mail: luyu@stanford.cdu 



BIE-SAM 3 



200ft iKang et all 1200.4 ICroton et all 1200ft button fc van den Boschl 12009( 1 . In the semi- 
analytical model (hereafter SAM), one adopts "recipes" to describe and parametrise the 
underlying physical ingredients, such as star formation and feedback. The free parameters 
in the models are then tuned to reproduce certain observational data of the galaxy pop- 
ulation, such as stellar mass functions, colour-magnitude relations, metallicity-stellar mass 
relations, Tully-Fisher relation, and two-point statistics that describe the spatial distribution 
of galaxies (e.g. the two-point correlation function, the pairwise peculiar velocity dispersion, 
etc.). However, the theory of galaxy formation and evolution still faces several outstanding 



problems (see 



Primack 



20091 . for an up-to-date review). For example, i t remains a chal- 



lenge to fit the faint -end slope of the galaxy luminosity function (e.g. 



Benson fc Madau 



2003 



Mo et al. 



20051 ). and the models typically predict disk rotation velociti es that are 



too high, unless ad i abati c contraction and/or disk self-gravity are ignored (e.g. 



Cole et al. 



2000; 



Dutton et al. 



20071 ). In addition, the mo dels have problems match i ng the evolution 



of the galaxy mass functio n with redshift (e.g. 



De Lucia &: Blaizot 



2007 



Somerville et al. 



2008 



Fontanot et al 



(IBaldry et al. 



2006 



200 9J), and typically o yerpredict the fraction of red sate 



Weinmann et al 



2006 



Kimm et al 



2009 



Liu et al 



lite galaxies 



2010). There are 



three main reasons for these problems. First and foremost, current models most likely miss 
some vital ingredients or the recipes used do not properly implement the physical mechanism. 
Second, sub-space features and deg eneracies in the model parameter space have been either 



missed or not sufficiently explored (ILiu et al 



2010 



Neistein fc Weinmann 



2010). Third, the 



difficulties may actually reflect inconsistencies in the data themselves (so-called "systematic" 
errors). For example, it has been pointed out that the observed evolution in the stellar mass 



function is inconsist ent with the observed cosmic star formation history ( IFardal et al 



2007 



Primack et al. 



20081). 



To address these problems, one must quantitatively characterise the model constraints 
implied by existing data sets as well as explore a wider range of models. The SAM approach 
provides a promising avenue to tackle these problems owing to its flexibility in implemen- 
tation and its relatively fast speed in computation. However, significant changes in the 
methodology must be made to fully utilise the potential of SAMs. The main shortcoming 
in current SAM implementations is that they are not probabilistically rigorous. In many 
published SAM applications, a subset of model parameters are held fixed while other pa- 
rameters are adjusted to match some observational properties. If the match is unsatisfactory, 
one further adjusts some of the parameters or changes the model parametrisation until a 



4 

"good" fit is achieved. However, the goodness of fit is often assessed "by eye" ; one overlays 
the model prediction, a luminosity function for example, on the observed result to see if 
the prediction is sufficiently close to the data. Since the statistical uncertainties in both the 
data and the model are not consistently computed, confidence levels do not follow. Simi- 
larly, since the model parameters are explored by hand, marginal probability can not be 
computed. As mentioned earlier, a number of physical processes in galaxy formation are still 
poorly understood, and so the parameterisations of these processes have to be made very 
general. This leaves a large parameter space to be probed. Given the high dimensionality of 
the parameter space and the complex covariance between parameters, it is almost impos- 
sible to find and delineate the dominant mode by hand-tuning model parameters. Third, 
since the model parameters might be strongly covariant, the effect of changing one model 
parameter is conditional to the values of other parameters that are kept fixed. Therefore, 
switching on and off a process in a fiducial model is unlikely to determine its importance 
to galaxy formation. Indeed, to investigate the influence of a specific recipe, one should 
allow the parameters to range over their entire a priori plausible domain and marginalise 
over all the other parameters. Unfortunately, this kind of analysis has not been commonly 
adopted owing to the lack of suitable methods. Fourth, because many processes in galaxy 
formation are still poorly understood, different SAMs may adopt different parameterisations 
for the same process. While all these models can be tuned to match a limited set of obser- 
vational data and they are all considered as "plausible", whether one model is favoured by 
the data more than another needs to be assessed by the marginal likelihood, instead of by 
the goodness-of-fit of a single optimised parameter set. To compute the marginal likelihood 
or evidence, all the parameters should be allowed to vary in the domain specified by the 
priors, so that model selection can be made according to statistical evidence. Again, such 
an analysis is not included in the current SAMs. 

In summary, a variety of physical processes affecting galaxy formation are not yet well 
understood while copious observational data exist to constrain the models. Thus, to derive 
meaningful constraints from observations, we would like to know the probability of the 
various model parameters and, indeed, entire model families given the data. This leads 
us directly to Bayesian inference! The semi-analytical model provides a very powerful tool 
to translate the theory of galaxy formation into a set of model parameters. The Bayesian 
approach will then allow us to obtain the posterior distribution of the model parameters for 
a given set of data and to assess how a particular model is supported by the data. Moreover, 



given different model families, Ba yesian mode 
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comparison techniques such as Bayes Factors 



and Reversible Jump techniques (jGreenlll995l ) allow one to determine the relative odds for 



each model to reproduce the observed data. 

Some attempts have been made recently in this direction. For instance, 



(120101 ) have explored the parameter space of the GALFO RM model ( 



ing a model emulator based on Latin hypercube sampling (IMckay et al 



Bower et al 



Bower et al 



20061 ) us 



19791 ) . and identified 



a small fraction of the initial volume of the parameter space that is not ruled out by their us- 
ing the K- and 6j-band lu minosity functions of gal axies in the local Universe as constraints. 
Using the same technique, Benson Sz Bower ( 201ol ) have perfor med an exhaustive se a rch o f 
mod el parameter spa c e con strained by more observational data. iKampakoglou et al.l ( 120081 ) 



and 



Henriques et al 



(120091 ) have adopted the Markov-Chain Monte-Carlo (MCMC) tech- 
nique to explore the ability of their adopted SAMs to accommodate multiple observational 
data sets. 

In this paper, we develop a scheme to incorporate SAMs into the framework of Bayesian 
inference. To this end, we generalise the parameterisations for the model recipes so that 
our model can encompass the large uncertainties owing to our limited knowledge of galaxy 
formation. We also show that, aided with advanced MCMC techniques and moderate com- 
putational facilities, it is now possible to build a Bayesian inference-based SAM to efficiently 
explore the high dimensional parameter space involved and to establish the posterior distri- 
bution of model parameters reliably. 

The goal of the present paper is a description of our approach and a demonstration 
of the advantages of a Bayesian inference-based SAM in comparison with the conventional 
approach. In particular, we will show that the common practise of tuning some model pa- 
rameters while keeping others fixed may lead to an incorrect inference because of the use 
of unjustified, strong priors, and that our Bayesian inference-based SAM can overcome this 
problem. We will also demonstrate the sensitivity of the inference to the error model adopted 
for the data. The paper is organised as follows. In §21 we describe our generalised SAM and 
its relations to other models. A brief introduction to the principle of Bayesian inference and 
of the MCMC technique is presented in §3j In §H we show a case study using the stellar 
mass function of galaxies as the observational constraint. The impacts of prior assumptions 
and data modelling on the model inference are presented in §5] and [6], respectively. Finally, 
in §7J we discuss and summarise our main results. 
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2 SEMI- ANALYTIC MODEL 

As in all other SAMs, our model consists of two main parts, (i) the assembly of individual 
dark matter halos, and (ii) gas, radiative and star- format ion processes relevant to galaxy 
formation. We first prepare a large set of halo merger trees with the currently favoured 
cosmology and adopt it for all our subsequent semi-analytical modelling of the baryonic 
processes. Since the formation of dark matter halos is now relatively well understood, we 
focus on the baryonic physics in our Bayesian analysis. 



2.1 Halo merger history 

Halo merger trees can either be ex tracted from cosmological iV-body simulations (e.g. 



Kang et al. 



2005 



Croton et at- 



tended Press-Schechter for malism ( Lacey fc Col 



2000 



van den Bosc 



2006 ), or generated by a Monte-Carlo method using the ex 



1993 



Somerville fc Kolatt 



1999 



Cole et al. 



2002J). Merger trees from simulations provide the dynamics and envi- 



ronments of the halo population, but their construction is computationally expensive and 
limited by numerical resolution. On the other hand, Monte-Carlo merger trees are compu- 
tationally cheaper to generate an d have, in principal, infi nite resolution. In this paper, we 



adopt the algorithm proposed by 



Parkinson et al. 



(120081 ) to generate the merger trees for 



halos with a given final (z = 0) virial mass. This algorithm has been tuned to match the 
conditional mass functions found in iV-body simulations. More specifically, as a demonstra- 
tion we choose the control parameters Go = 1, 71 = 72 = 0, so that the resulting halo 



conditional mass functions are t 



mass function (IParkinson et al 



rose p redicted by the Extended Press-Schechter conditional 



20081 ). We sample a certain number of merger trees in each 



halo mass bin from 10 10 h~ 1 M Q to 10 15 /i _1 M , the mass range relevant to the modelling 
in this paper. Since the halos and their merger trees are randomly sampled from the halo 
mass function and the conditional mass function, model predictions based on a finite merger 
tree sample suffer from sampling variance. To reduce such sampling variance, we generate a 
sufficiently large number of halo merger trees in each mass bin so that the variance in model 
predictions induced by merger-tree sampling is much smaller than the error in the observa- 
tional data used to constrain the model and, hence, can be ignored. Specifically, we use 1000 
merger trees for halos with present masses in the range 10 11 - 10 12,5 /i _1 M , 1500 merger 
trees in the range 10 125 - 10 13 5 /i _1 M , 400 merger trees in the range 10 10 - 10 11 /i _1 M Q , 
and about 100 merger trees in the range 10 13 ' 5 - 10 15 /i _1 M Q . Since massive halos are rare 
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in the assumed cosmology, their contribution to the scatter of the stellar mass function is 
negligible. We vary the mass resolution of our merger trees with the final halo mass. For 
halos with final masses smaller than 10 12 /i _1 M Q , the mass resolution is 10 9,3 /i _1 M Q ; for 
halos with final masses larger than 10 14 /i _1 M , it is 10 11 /i _1 M ; and for intermediate mass 
halos, it is 10 10 /i _1 M . All the merger trees are sampled using 60 snapshots equally spaced 
in log(l + z) from z = 7 to z = 0. Throughout the paper, we use a ACDM cosmology with 
Qm = 0.2 6, r^R n = 0.044, h = 0.71, n = 0.96, and cr 8 = 0.79, which are consistent with 



WMAP5 flDunklev et al 



2009 



Komatsu et al. 



2009|). 



2.2 Radiative cooling 

Once the halo formation history is fixed, we model the radiative cooling of halo gas. As 



shown in 



Lu et al. 



(120101 ) . the predictions of often- used cooling models do not agree. Since 
these models do not incorporate uncertainties in their cooling prescriptions, the model choice 
imposes a strong prior on the SAM. To compare with published results, we use the cooling 



model of 



Croton et al. 



(120061 ) . We will study the effects of varying the cooling prescription 
in a future paper. In the Croton model, the halo hot gas is redistributed at every time-step, 
and the density profile of the hot gas is assumed to be a singular isothermal profile, 



Pgas 



m g as0 _ r -2 



47rr v 



where r vir is the virial radius of the halo. The total mass of hot halo gas mass is m gas o 



i 

cold 



m 



outJ ' 



where /b = ^b/^o is the universal baryon fraction, m*, m co id 



and m out are the masses in stars, cold gas and ejected gas, respectively, and the summation 
is over all galaxies in the halo. The temperature of the hot gas is constant for each halo with 



T 



T 



35.9( 



kms 



2 K where v v[r is the circular velocity of the halo at the virial radius. 



The cooling timescale of the gas at radius r is then estimated by 



T C ool(0 = 7T 



gas 



2 Pgas(^)-^-(-^gas) -^gas) 



(1) 



ecular weight in unit s of th e mass of hydrogen atom, and A is 



where [i is the mean mo 
the cooling function from ISutherland fc Dopital (jl993l ). At each time-step, we calculate the 
cooling radius r cool by equating the cooling timescale with the dynamical timescale, r coo i = 
Tdyn = fvirAw If the cooling radius is equal to or smaller than the virial radius, the cooling 
rate is defined as 

^cool^vir 



rn coo i = 0.5m h , 



ot" 



(2) 



on 1 n T> A O TV /TMD A o c\c\c\ '>•> '>•> 



In other words, half of the hot gas mass enclosed by the cooling radius cools and accretes 
onto the central object of the halo in a dynamical timescale. If the cooling radius is larger 
than the virial radius, we set the cooling rate equal to the total hot gas mass in the halo 
divided by the dynamical timescale. We implicitly assume that all hot gas is associated with 
the primary halo and that only the central galaxy can accrete cooling gas; that is, satellite 
subhalos contain no hot gas. 



In some recent S AMs, Active Galactic Nuclei (AGN) 



in massive halos (e.g. 



Croton et al 



2006 



Bower et al. 



' eedback reduces the ga s cooling 



2006 



Somerville et al. 



2008|). Equiva- 



lents, AGN feedb ack stops radiative coo ling in halos with masses larger than a characteristic 
mass (~ 1O 12 M ) (ICattaneo et al.ll2006l ). To include this effect, we introduce a characteristic 
halo mass for radiative cooling, Mcc, above which radiative cooling of the hot halo gas is 
assumed to be negligible. Since the exact value of Mcc is n °t known a priori, we treat it as 
a free parameter in a relatively large mass range, 10 1L5 - 10 14 ' 5 /i _1 M . 



2.3 Star formation 



We assume that the cooled-fraction of halo gas settles into the galaxy in an exponential disk 
with scale length r^isc- This gas form stars when the gas disk has a surface density higher 
than a certain threshold, Esf, mimicking the critical surface gas density for star form ation 



seen in disk galaxies (e.g. 



Kennicutt 



199 



Kennicutt et al. 



200 



7; 



Bigiel et al.ll2008l ). The 



fraction of cold gas above the threshold is given by the ratio of the radius r crit at which the 
cold gas density is S SF to the disk scale length: 

f«cold 



' crit Ad 



In 



27rr disc E SF : 



(3) 



where m cold is the total cold gas mass of the galaxy. Therefore, the cold gas mass enclosed 
by r crit is determined by the ratio r^ isr ,T,a,p/m rn M- Obser vationally, the threshold surface 



density is ~ 10 M pc 2 (e.g. 



Martin fc Kennicutt 



20011 ) . although the scale length may 



vary. Theoretically, the disk radius (the scale- length) is related to the virial radius and 
the spin parameter of its host halo: raise ~ ^ r vir (e.g. 



Mo et al 



19981 ) . In cosmological 



A^-body simulations, the spin parameter s, A, for dark matter halos follow a lo g-normal 



distribution with a median of ~ 0.05 (e.g. 



Warren et al. 



1992 



Cole fc Lacey 



1996|), but the 



distr ibution of A for t he baryonic componen t that forms galaxy disks is poorly understood 



(e.g. 



Bett et al. 



2010 



iNavarro fc Benz 



19911 ). In our SAM, we adopt the fiducial value Ao = 
0.05. This yields rdi SC ,o = 0.035r vir and £sf,o = 1 M pc~ 2 . We then parametrise the term 
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/ SF r^„ n SsKn. In the 



Croton et al. 



( 120061 ) model, r disc is set to be 3r disc0 , and 



r disc S SF - JSF'disc.O^SF.O 

S SF = 10 M pc" 2 , so that / SF = 90. 

Using on our parametrisation, the cold gas mass in the disk available for star formation 

is 

m C oid \ 27r/sF£ SF ,ordi SCi0 



™sf = ^cold 



In 



27r/s F E SF ,or| sc n / 



"2cold 



(4) 



We assume that the star formation rate is proportional to the cold gas mass within r cr i t and 



inversely proportional to the dynamical timescale of the disk, 



<<lls 



yielding 



'''disc 



(5) 



where e* is the star formation efficiency We assume that e* has a broken power-law depen- 
dence on the circular velocity of the host halo: 



«SF 



Vyir > V SF ; 



a S F 



( ^vir ^ 

\V S fJ 



PSF 



(6) 



SF, 



wher e a^p and /3r F ar e para meters. Early mode ls adopted a pure power-law until e* ~ 1 



[e.g. 



Kang et al. 



2005|). The 



Croton et al 



( 120061 ) model assumes /?sf = and sets «sf so 
th at 5-15% of the co ld gas is converted into stars in a disk dynamical time. The GALFORM 
of I Cole et al.l (|2000[ ) considers cases with /?sf = 0, 1.5 and 2.5. In our model, all four 
parameters, asF, /?sf, Vsf and /sf, are considered free parameters when modelling star 
formation in quiescent disks. It should be pointed out that our model is still based on a 
specific set of assumptions, even though it allows a large range of uncertainties in model 
parameters. There are other pres c riptions for star formation that are not included in our 



model (e.g. 



Somerville et al 



2008 



Krumholz et al. 



2009 



Fu et al 



2010|). 



2.4 Supernova feedback 

We assume that supernova (SN) feedback affects the interstellar medium (ISM) and hot 
halo gas in three ways: (i) the energy feedback from SN reheats a fraction of the disk ISM 
from the cold phase to the hot phase, and the reheated gas is mixed with the hot halo gas; 
(ii) a fraction or all of the heated gas is directly ejected from the host halo without mixing 
with the hot halo gas; and (iii) if the SN energy from all galaxies in a halo is sufficiently 
large, the hot gas in the host halo can be heated, causing a fraction of the halo hot gas to be 
ejected from the halo. No SAM has incorporated all of these mechanisms and the strength 
of the feedback is usually chosen without strong prior justification. For example, the Croton 



10 

model considered both m echanisms (i) and (ni) (ICroton et al.l 120061 ). while GALFORM 
incorporated (i) and (ii) (IBenson et al.l 120031 ). In these models, the total amount of SN 
feedback energy is assumed to be related to the star formation rate, and the feedback is 
assumed to be inst antaneous. The feedback strength is controlled by a fixed number (e.g. 



Croton et ajj 



20061 ) or assumed to have a power-law dependence on the circu 



of the host halo (e.g. 



Somerville fc Kolatt 



1999 



Cole et al 



2000 



Kang et al 



ar ve locity 



2005|). Our 



model incorporates all three mechanisms, and their relative strengths are free parameters. 
We assume that for every solar mass of stars formed, the energy released by supernovae is 
rj sn E sn , where ?7 sn is determined by the stellar initial mass function (IMF) and E sn = 10 51 erg. 
Our feedback model enforces energy conservation, so that the total energy to heat the gas 
cannot exceed the total energy released from supernovae. 

We write the SN energy released by a mass of Am* of star formation as 

En, = a SN ^Am*Vg 2 N (7) 

where Vsn = 630km/s and the free parameter «sn describes the uncertainties in the feedback 
energy and in t he IMF. For a Sca lo IMF (?7 sn = 5 x 10 -3 ) and with 20% of the SN energy 
in feedback (e.g. IfCang et al.ll2005l ). we find «sn = 0.25. We allow a SN to vary from 0.001 to 
10, encompassing the uncertainty of this parametrisation. To conserve energy, the total SN 
energy released by m* of star formation and available for feedback, E^,, should be equal to 
the sum of the energies used for the reheating, ejection and powering the wind. Thus, we 
can write 



E 



fb 



/ej) f A Am*v 



2 

vir 



1- - ■ .2 1 



-/ ej / rh Am*^ c + ^Am wind w e 2 sc , 



(8) 



2 \ jtj/jiu f-vir 1 2 J JJ 2 

where the coefficients, / r h and / C j, control the mass loading for the reheating and ejection, 
f esc is the circular velocity of the current host halo characterising its binding energy, and 
v v i T is the circular velocity of the host halo at the latest time when it was still a primary 
halo, characterising the binding energy of the galaxy. Note that v esc ^ v viT only for satellite 
galaxies. We further assume that the fraction for reheating, / r h, has a power-law dependence 
on the circular velocity of the halo, v v [ r . If the galaxy is a satellite, we use the circular velocity 
of its host halo when it first became a subhalo. So we have 
V \^ 



rb 



(9) 



where V$ is an arbitrary factor and is set to be 220km/s. The power-law has an upper limit 
given by energy conservation: 
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/rh 



(10) 



When an amount of /^Am* cold gas is reheated, we assume a fraction / e j escapes from the 
halo. For simplicity, we assume / e j has a power-law dependence on the circular velocity of 
the current host halo: 



/e 



Vn 



/?ej 



Again energy conservation sets an upper limit on f e y. 



/ej, 



cj.max 



rh .max 



ill 



n -1 



(11) 



(12) 



If there is still energy available after reheating and ejection, the surplus is assumed to power 
a wind, and the mass of the wind can be written as 



Am wind = e w Am, <^ a S N 



rii 



(13) 



We assume that a fraction of fm of the gas in the outflow, ejection and wind will come back 
to the halo as hot gas in a dynamical timescale, and we treat /rj as a free parameter. 

Thus, we model the SN feedback with 7 parameters: osn, «rh, Prk, «ej, /?ej> ew an d /rj. 
Because the wind dominates the outflow, we find that oej and /?ej are not constrained by 
the stellar mass function alone. Therefore, we fix «ej = and /3ej = in the present paper. 
Our model shares a number of common parameterisati ons with other mod els. For example, 

ywer et al. ( 201oi); if «ej and /3rh 



the reheating model is similar to the model studied in 
are set to be 0, our model is reduced to the Croton m odel ( 
be 0, our model is similar to the model described in 



Croton et al.ll2006h ; if ew is set to 



Somerville et al 



worth pointing out that other parameterisations are also possible (e.g. 



fl2008h. However, it is 



Benson et al. 



2003|). 



2.5 Galaxy mergers 

When two dark matter halos merge, we simply add the dark matter and hot gas of the 

smaller halo to the bigger one. The central galaxy of the more massive halo is then treated 

as the central galaxy of the new halo, and all other galaxies are considered as satellites. A 

satellite galaxy merges with the central galaxy in some fraction /df of the dynamical friction 

timescale. The dynamical friction timescale is parametrised as 
1 1 7r 2 - v ■ 

tblc lnAGM sat ' li4j 
where r vir and v v - n are the virial radius and circular velocity of the new host halo, M sat is the 

mass of the previous host halo of the satellite before it merges into the current halo, and In A 
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is the Coulomb logarithm, which is modelled as In A = ln(l + M v i r /M sat ) (e.g. ICroton et al 
20061 ). This formula assumes that the satellite galaxy is hosted by a subhalo with mass M sat 
and orbits in a central halo with a singular isothermal de nsity profile of circular velocity 
v v { r , starting at the virial radius (IBinney &: Tremaind 119871 ). Earlier SAMs adopted similar 



parameterisations, 
mass for M sa t (e.g 



jut used differen t prefactors. For example, some SAMs chose the galaxy 



Cole et al 



Croton et al 



2000|) and some others chose the subhalo mass for M sat (e.g. 



20061 ); this results in an order of magnitude difference in the prefactor. Other 



uncertainties include the value of the Coulomb logarithm, the effect of tidal stripping on 
orbital decay, and the initial velocity of the satellite. In our model, these uncertainties are 
absorbed into the prefactor, /df, a free parameter. 

The merging timescale is calculated when the host halo of the satellite merges into the 
host halo of the central galaxy. If the satellite was already a satellite before the merger, 
the dynamical fraction timescale for the satellite is recalculated based on the properties of 
the new host. When a satellite galaxy merges into the central galaxy, our treatment for the 
merger remnant depends on the mass ratio of the two galaxies, m sat /m ccntra i. Mergers are 
considered as major or minor depending on whether m sat /m cen trai is larger or smaller than a 
pre-selected /mg < 1- The values of /mg adopted in earlier SAMs are ~ 0.3. As the choice 
of this parameter is not constrained by the stellar mass function of galaxies, we simply take 
/mg = 0.3 instead of treating it as a free parameter. 

For a minor merger (m sat /m ccntra i < 0.3), the satellite's stars are added to the central 
bulge, and the satellite's gas is added to the central disk. A minor merger is assumed to 
trigger a star-burst in the disk, and all the stars formed in the burst are added to the disk 
component. For a major merger (m sat /m ccntra i > 0.3), we combine all the existing stars from 
the two merging galaxies into a central galaxy, which is now assumed to be an elliptical. 
Each major merger triggers a star-burst, and all stars formed in the burst are added into 
the central elliptical galaxy. A fraction eburst of the combined cold gas in the two merging 
progenitors becomes stars, and the rest joins the gaseous disk. We assume that et, U rst depends 
on the ratio of the baryon masses of the two galaxies: e^ irst = <^bm^t(m^t/m ceritr p ,] ) l3bu rat 



Similar models for galaxy mergers were adopted by 



Somerville et al. 



Croton et al. 



( 1200ll . |2008|) and 



(120061 ) although different authors used different values for the model parame- 
ters. In our model, the four parameters in the parametrisation, / DF , /mg, «burst and Eburst) 
are all treated as free parameters. As mentioned above, since /mg is n °t constrained by the 
stellar mass function considered in this paper, we simply fix its value to be 0.3. 



nm n r> A c t\ iTatt> a o nnn *?•? 



2.6 Calculation of a single model 
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The flowchart shown in Figure [T] summarises the calculation of the SAM described above. 
The code loads merger trees and other tables (e.g. cooling functions, stellar mass-to-light 
ratios for different star formation histories, and dust extinction) for subsequent calculations, 
and then reads the model parameters introduced above in this section, which are summarised 
in Table 1. We walk each merger tree from the top (the initial time) to bottom (the present 
time). At each tree level, a galaxy grows in the centre of a halo if the halo does not have 
any progenitor halos. If the halo is assembled through the mergers of progenitor halos, the 
central galaxy of the most massive progenitor is considered to be the central galaxy of the 
current halo, and all the other existing galaxies are considered to be satellites. 

At the initial time, we distribute hot gas in the dark matter halos and radiative cooling 
begins. We sub-divide each of the 60 time steps used to sample a merger tree into 5 finer 
time steps (equally spaced in t) to compute the cooling and to evolve the galaxies. In every 
time step, gas that is able to cool in the current time step is assigned to the central galaxy. 
For all galaxies in the halo, star formation continues until the cold gas surface density 
drops below the threshold value. When a satellite galaxy merges into a central galaxy, the 
recipes for the morphological transformation and merger-triggered starburst are applied. 
For both star formation modes, quiescent or bursts, we calculate the effects of SN feedback. 
We model chemic al evolution in the ISM using the "instantaneous recycling approximation" 



(jCole et al.l 120001 ): a fraction R of the newly formed stellar mass and a yield p of heavy 
elements are instantaneously returned to and uniformly mixed with the cold gas. Metals 
enrich the halo gas as the reheated gas mixes with the hot halo gas (assuming a one-zone 
model, see Subsection 12 .4[) and affect the cooling rate. Both R and p depend on the IMF. 
However, since we have adopted a simplified model for gas cooling (see Subsection 12. 2j) and 
since the stellar mass function we are concerned with here is affected by metallicity only 
through gas cooling, in this paper we simply fix R = 0.3 and p = 0.03 instead of treating 
them as free parameters. Our code uses the Stel lar Population Sy n thesis (SPS) model of 



Bruzual fc Chariot 
to galaxies. 



(120031 ) and the dust model of iKauffmann et al.l (119991 ) to assign fluxes 



The evolution continues until the root of the merger tree is reached. At this point, we 
have a realisation of the model specified by the set of parameters. The results obtained 
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from these realisations can then be used to compare with observational data to evaluate the 
likelihood of the data given the model. 



3 BAYESIAN MODEL INFERENCE AND THE MCMC METHOD 
3.1 Bayesian inference 

Bayes Theorem states that the posterior probability of a set of parameters in a model (or 
hypothesis) H for given data D is 

P(0|D, H) oc P(@\H)L(D\@, H), (15) 

where P(®\H) is the prior probability distribution, which describes any knowledge acquired 
about the parameters before seeing the data, and L(D|0, H) is the likelihood of the data 
D for the given model parameter set 0. As mentioned earlier, for SAMs, we have limited 
prior knowledge about the model parameters. Therefore, we choose either uniform or 1/x 
distributions between two physically chosen bounds for the prior distributions, depending 
on the particular parameter in question. As a test, the priors for some of the parameters 
are made strongly restrictive to demonstrate the sensitivity to these choices. Our assumed 
priors for the standard model (Case 0) are summarised in Tabled] as the first listed for each 
parameter. Note that the prior adopted for cksn allows the model to use more energy to 
power the feedback than the total SN energy assumed to be available. With such a prior, we 
test whether the model could explain the data if the SN energy is somehow underestimated. 
The problem-specific definition of the likelihood function is described in later sections. 



3.2 The Markov-Chain Monte-Carlo algorithm 

Since it is not possible to integrate the posterior probability distribution function analyt- 
ically for our SAM, we use a Monte-Carlo sampling approach to elucidate the posterior 
distri bution. We adopt a newly developed software package, the Bayesian Inference Engine 
(BIE, Weinberg 2010al lbl). which includes a suite of advanced MCMC algorithms and sup- 



ports parallel computation. A detailed description of the package is beyond the scope of the 
present paper and can be found in the two references cited above. Here we present a brief 
description. 

As we will show later, the topological structure of the posterior probability distribution 
in our problem is high- dimensional and very complex. Not only does the posterior show 
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# 


Parameter 


Meaning 


Prior 


Posterior 


1 


logM cc (M s ) 


cooling cut-off halo mass 


[1.5, 4.5] 
[1.5 , 4.5] 
[1.5, 4.5] 


[2.19, 2.67] [3.09, 4.47] 
[2.13, 2.49] [3.15, 4.47] 
[2.07, 2.49] 


2 


log OSF 


star formation efficiency power-law amplitude 


[-3, 0] 
[-3, 0] 
[-3, 0] 


[-2.19, -0.03] 
[-2.97, -2.85] [-1.47, -0.03] 
[-2.97, -2.49] 


3 




star formation efficiency power-law index 


[-1, 12] 
[-0.2, 0.2] 
[-0.2, 0.2] 


[-0.87, 0.43] [3.29, 11.87] 
[-0.2, 0.2] 
[-0.2, 0.2] 


1 


logVsF (km/s) 


star formation law turn-over halo circular velocity 


[1.5, 3.0] 
[2.1, 2.3] 
[2.1, 2.3] 


[1.52, 2.39] 
[2.1, 2.3] 
[2.1, 2.3] 


5 


log/sF(M Q /pc 2 ) 


star formation threshold gas surface density 


[-1, 3] 
[1.8, 2.2] 
[1.8, 2.2] 


[-0.96, -0.64] [-0.24, 2.16] 
[1.8, 2.2] 
[1.8, 2.2] 


6 


log asN 


SN feedback energy fraction 


[-3, 1] 
[-3, 1] 
[-3, 1] 


[-2.35, 0.85] 
[-1.35, -1.15] [-0.25, 1.00] 
[-1.75, 0.25] 


7 


log Qrh 


SN feedback reheating power-law amplitude 


[-3, 2] 
[-3, 2] 
[-3, 2] 


[-2.55, 1.95] 
[-2.65, -0.75] 
[0.260, 1.22] [-0.75, 1.95] 


8 


/3rh 


SN feedback reheating power-law index 


[0, 14] 
[0, 14] 
[1.8, 2.2] 


[0.14, 13.86] 
[5.46, 11.62] 
[1.8, 2.2] 


9 


log e w 


fraction of surplus SN feedback energy used for powering wind 


[-3, 0] 
[-3, 0] 
[-3, 0] 


[-2.97, -0.15] 
[-2.97, -0.81] 
[-2.97, -0.21] 


10 


log /hi 


fraction of re-infall ejected hot gas 


[-2, 0] 
[-2, 0] 
[-2, 0] 


[-1.98, -0.02] 
[-1.97, -0.03] 
[-1.94, -0.02] 


11 


log /df 


merging time-scale in dynamical friction time-scale 


[-1, 2] 
[-1, 2] 
[-1, 2] 


[0.53, 1.97] 
[0.23, 0.59] [0.77, 1.97] 
[0.05, 0.65] 


12 


logasB 


merger triggered star burst efficiency power-law amplitude 


[-2, 0] 
[-2, 0] 
[-2, 0] 


[-1.98, -0.02] 
[-1.97, -0.09] 
[-1.97, -0.15] 


13 




merger triggered star burst efficiency power-law index 


[0, 2] 
[0, 2] 
[0, 2] 


[0.02, 1.98] 
[0.02, 1.98] 
[0.02, 1.98] 


14 


c*ej (fixed) 


SN feedback cold gas ejection power-law amplitude 


0.0 


0.0 


15 


/3 E J (fixed) 


SN feedback cold gas ejection power-law index 


0.0 


0.0 


16 


/mg (fixed) 


major merger minor merger threshold 


0.3 


0.3 



Table 1. Summary of the model parameters. Column 2: the parameter; Column 3: the description of the parameter; Column 
4: the prior distribution; Column 5: the 95% confidence bound of the posterior distribution. For the prior and posterior 
distributions, the three rows for each parameter are for Case 0, Case 1 and Case 2, respectively. Parameter 1 to 13 are set free, 
whereas parameter 14, 15 and 16 arc fixed. 
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multi-modality and strong degeneracies among the model parameters, but also the high- 
probability regions only occupy a v ery small fraction of the entire parameter space along a 



curving, very thin manifold (also see 



Bower et al. 



20101 ) . Because of this, it is technically chal- 



lenging to sample the posterior efficiently using the standard Metropolis-Hastings MCMC 



algorithm. To overcome this proble m, we ado 



3t diff erential evolution algorithm as the main 



algorithm for our MCMC sampler (ITer Braakll2006l ). For every single chain at each step, the 



differential evolution algorithm randomly selects two other chains and uses a fraction of the 
vector connecting the current states of the two chains as a proposal. This strategy improves 
proposal efficiency and mixing by automatically "tuning" the proposals to the ensemble 
of states comprising the individual chains. For a multi-dimensional Gaussian posterior, the 
optimised fractio n of the vector i s 70 = 2.38/V / iVdim, where iVdim is the dimension of the 
parameter space ( ITer Braakll2006l ). Since our posterior is expected to deviate significantly 
from a Gaussian, we use 7 = O.I70 to maintain a good acceptance rate (~ 10%). After every 
10 steps, we use the full vector as the proposal by temporally setting 7 = 1 to allow the 
chains to swap modes. As the simulation proceeds, the chains gradually settle into the high 
probability regions and the distribution can guide the chains to move along the ridges of the 
posterior or to jump between different modes. Moreover, all the converged chains sample 
the posterior, further enhancing the overall efficiency. 

To enhance mixing and to e xplore the parameter space more efficiently, we combine a 
tempered simulation algorithm (INeall 119961 ) with differential evolution. In short, tempered 
simulation proposes exchanges between the posterior distribution of Pq and a "powered- 

1 IT- 

up" distribution Pj oc P with T < T max . Each step begins by "melting", T i+ i > T 
followed by "freezing", T + \ < Tj. We perform one tempered step for every 21 standard 
steps, with the maximum temperature T max selected to be similar to the difference in the 
logarithmic posterior probability between a high-probability region and a low-probability 
valley: T max « lnP max /P min . In the temperature range from 1 to T max , we set M temperature 



levels equally spaced logarithmically. The default value of M is set to be V A^im + 3 In T max . 

For our problem, Adi m = 13 and we set T max = 64, so that M = 16. At each temperature 

1/2 

level Ti, 10 differential evolution steps are taken, with 7 stretched by a factor of T t . In 
total, it takes 320 differential evolution steps for a chain to go through the "melting" and 
"freezing" procedure for a single tempered step. As the parameter T max controls the efficiency 
with which the MCMC chains explore parameter space, we have carried out tests by varying 
the maximum temperature. The tests show that the T max we choose is sufficiently high: the 
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posterior does not show any new features as the temperature is increased by a factor of 
4. Using a suite of tests, we also find that the tempered steps with the T max specified as 
above substantially improves the efficiency of exploring state space, speeding up convergence, 
although employing such steps significantly increases the computational load. 



3.3 A Bayesian-inference based SAM 

We outline the structure of our Bayesian-inference based SAM in Figure |2j The MCMC 
algorithm provides proposal parameter vectors for the SAM, and the SAM predicts the 
galaxy population using the proposed parameter set. The likelihood is evaluated by com- 
paring the model prediction with the data, and is returned to the MCMC. The MCMC 
algorithm accepts or rejects the proposal based on the posterior probability and generates a 
new proposal for the SAM. We use our tempered differential evolution algorithm described 
above with 128 chains running in parallel. The MCMC-SAM loop continues until conver- 
gence is achieved. The converge nce of the chains is monitored by the Gelman-Rubin R 
statistic ( iGelman &: Rubinlll992l ). In essence, R is the ratio of the variance between chains 
to the variance within chains. We declare convergence when R < 1.2. When the chains are 
converged, we use post-convergence states (typically about 10 6 ) to study and characterise 
the posterior distribution. The converged states sample the full probability distribution of 
the model parameters given the observational data, and can be used to estimate confidence 
regions for individual model-data comparisons through marginalisation and to determine 
the relative posterior odds for different models. In the following sections, we use a simple 
example to demonstrate the power of our Bayesian-inference based SAM. 



4 THE SAM POSTERIOR: STELLAR MASS FUNCTION CONSTRAINED 

In this paper, we consider constraints on our SAM provided by the stellar mass function of 
galaxies, a fundamental property of the galaxy population that has been extensively used 
for model-data comparison. We choose the stellar mass function instead of the luminosity 
function simply because the stellar mass of galaxies is a direct prediction of our SAM, 
and hence we avoid problems associated with any uncertainties in the stellar population 
synthesis or dust models in our predictions. However, these same uncertainties are present 
in the reduced data, since a stellar population synthesis model was used to convert the 
observed galaxy luminosities into stellar masses. These uncertainties should in principle be 



properly included in the error budget of the observational data. Since the purpose of this 
section is purely to provide a concrete demonstration of our method and to illustrate the 
complexities inherent in the posterior distribution function, we adopt an ad hoc model for 
the errors. In £j6l we explore the impact of the error model on the Bayesian inference by 
changing our assumptions about the errors. 

We study the constraints on the 13 free pa r amete rs characterising our SAM (see Table 



1) using the stellar mass function of iBell et al.l ( 120031 ). Assuming that stellar mass bins are 



mutually independent, the likelihood function is 



L($ ohs \9) = L e W LY. [ ® l '° h \t l ' mOdm2 ) , (16) 



2°lobs 



where L is an arbitrary normalisation factor, $j i0 b s is the value of the observed stellar 
mass function in the ith bin, $j imo d is the corresponding value predicted by the model with 
the given parameter set 9, and of u a is t he variance of the observed stellar mass function. 



The error estimation of 



Bell et al. 



(120031 ) only takes into account the sampling error, but 
we expect significant bias (systematic uncertainty) from the assumptions made in the data 
reduction. To mimic the effect of large systematic uncertainty, we artificially inflate the 
statistical error bars by a factor of 3. Please note, we are not advocating this procedure, 
rather, we argue this is a very bad thing to do in general for at least two reasons: (i) this tends 
to imply greater support for a model than is truly admitted by the data, and conversely, 
tends to reduce the ability of the data to choose between competing hypotheses; and (ii) 
inflated error may hide serious problems with the data or inconsistencies with other data. 
Strictly speaking, the Bayesian approach applies equally well to systematic uncertainty as 
to sampling error. Mathematically, let systematic uncertainties be described by a parameter 
vector 7]. The likelihood now depends on rj through $i i0 bs( ? ?)- We simply define a prior 
distribution for the uncertainty P(rj) by expert opinion or through an ancillary calibration. 
The inference continues as before, now with the augmented parameter vector © = (6,rj). 
In the end, we simply marginalise over rj. For our problem specifically, we are aware our 
error inflation produce is ad hoc and does not correctly represent the bin-to-bin covariance 
in $i )0 bs induced by the stellar mass function. We will discuss how such covariance affects 
our results in §6j We will perform a luminosity function-based inference using a population 
synthesis model and an appropriately chosen prior uncertainty in a future paper. However, 
the lack of a stellar mass function with a suitably described error model forces us to make a 
crude error model approximation for the point of illustration in this section. In addition, our 
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Monte-Carlo evaluation of $j, m od has variance owing to the finite sampling of the assembly 
histories of dark matter halos. This model dispersion should be included in the likelihood. 
However, it is typically 3 times smaller than the inflated error bars in the data and, therefore, 
not explicitly included in equation (JTBjl . 

Our model also has other uncertainties. For e xample, the coo ling rate given by different 



prescriptions can vary by a factor of a few (see iLu et al.ll2010l ). Such uncert ainties could 



be in cluded in the likelihood evaluation if they were properly understood (e.g. 



Bower et al. 



2010l ). Alternatively, one may include model uncertainties as part of the model by using an 



extended model family. In this paper, we restrict our demonstration to a fixed model family 
and ignore any uncertainties other than those represented by the priors. In a future paper 
we will demonstrate how to extend the analysis to multiple model families using Bayesian 
model selection. 

We use our Tempered-Differential Evolution algorithm to run the MCMC simulation with 
128 chains in parallel. The initial states of the chains are randomly distributed in parameter 
space according to the prior probability distribution. We terminate the simulation after 
16,000 iterations, when a sufficiently large number of states are collected to summarise the 
marginalised posterior. The Gelman- Rubin statistic monitors the convergence of the MCMC 
simulation, and it identifies 123 chains that are well mixed after the simulation terminates. 
In Figure [3j we plot the trajectories of 3 chains randomly selected from the mixed chains and 
compare them with a trajectory of a outlier chain. One sees that the chains were all widely 
dispersed at the beginning. The mixed chains gradually converge to a high probability mode 
after about 3000 iterations. In contrast, the outlier chain does not converge, but wanders 
around in low probability regions. The simulations are kept running for 16,000 iterations, 
even though most of the chains have "burned-in" after 4000 iterations. For the analysis 
presented in this section, we take the consecutive 12,000 steps of the 123 converged chains, 
about 1.5 million states, to summarise the marginalised posterior probability distributions 
of the model parameters. 



4.1 Physical implications 

Figure H] shows the one- and two-dimensional marginalised posterior probability distributions 
of the 13 free parameters. Three of these parameters, /rj, osb and /3sb; are unconstrained 
by the stellar mass function and not shown. In the upper-right corner of the figure, we plot 
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the predicted stellar mass function by marginalising over the 95% confidence range of the 
posterior. Clearly, the stellar mass function is well reproduced by the model. Tabled] lists the 
95% confidence bounds of all the parameters (as the first range listed for each parameter). 

The strength of the constraints varies widely. Some parameters are weakly constrained: 
for example, e\y, the efficiency of SN feedback powering the galactic wind, is very weakly 
constrained. In contrast, some parameters are tightly constrained: for example, Vsf is con- 
strained to a narrow range (around ~ 160km/s), so are /3gp (around 6) and /3rh (around 8). 
Our inferred values of /?sf and /3rh are much higher than those adopted in previous SAMs. 
The posterior indicates a sharply suppressed star formation efficiency in halos with circular 
velocities below ~ 160km/s. In addition, the posterior distribution in the /3sf - Am plane 
reveals bimodality: either the star formation efficiency or the SN reheating efficiency is a 
steep power-law of halo circular velocity. In other words, the shallow slope of the stellar 
mass function at the low-mass end requires the suppression of star formation in small halos. 
Since the star formation efficiency directly controls the conversion of cold gas into stellar 
mass, the high /?sf mode dominates the high /3rh mode. We are unsure whether or not such 
high values of /?sf and /3rh are physically plausible. It is likely that some new physics in 
addition to SN feedback is required to suppress star formation in low-mass halos, as we will 
demonstrate in detail in a forthcoming paper. 

Some model parameters are strongly correlated. These include the following pairs of 
parameters: /sf~«sf; «rh~/3rh; and Mcc _ /df- Both «sf and /sf control the conversion of 
cold gas into stars and the degeneracy is expected. Similarly, the two parameters in the 
power-law parametrisation of the SN reheating, «rh and /3rh, are degenerate. And again, 
the parameters controlling the two mechanisms responsible for the formation of central 
galaxies in massive halos, Mqc and /df are correlated; massive central galaxies can either 
acquire their mass through gas cooling and in situ star formation, or through the accretion of 
satellite galaxies. The observed sharp decline of the stellar mass function at the high-mass 
end requires either that gas cooling in halos more massive than ~ 10 12 M is effectively 
quenched (e.g. by AGN feedback) or that the merger rate of satellite galaxies into the 
central galaxy by dynamical friction is slow. 

Comparing our results with those of previous studies, we notice that so me of the modes 



we id entified are broadly consistent with those found by othe r stud ies. iHenriques et al. 



(120091 ) found that edi SC in the model proposed by 



Croton et al. 



to «rh in our model, is required to be as high as about 10. iBower et al.l (120101 ) found 



(120061). whic h corre sponds 



BIE-SAM 21 

that the normalisation for the star formation efficiency is as low as about 0.003, which is 
also similar to the mode we find for agp. Nevertheless, the features shown in the posterior 
distributions in these studies and ours do not generally agree because of different definitions 
of the parameters. 



4.2 Structure of the posterior distribution 

The two-dimensional posterior distributions shown in Figure H] are marginalised over 11 
dimensions and wash out much of the intrinsic sub-dimensional structure that complicates 
the inference and renders tweaking by hand unreliable. To demonstrate this, Figure [5] shows 
a 3-dimensional cut through the 13-dimensional likelihood function. The cut is made by 
computing the likelihood function on a fine grid of «sf, «rh and /sf and fixing the other 
10 parameters to the values where the likelihood function has its global maximum. In the 
plotted volume, the maximum logarithmic likelihood is log(L) = —3.41, which is enclosed by 
the inner surface (blue) denoting log(L) = —4. The outer surface (red) has log(L) = —9.9. 
If the likelihood function were Gaussian, the outer surface would be approximately at the 
"1-ct" level. The contour lines on the planes are linearly spaced in logarithm with a spacing of 
13.4. The figure shows that the parameters are strongly correlated in the plotted range and 
that the likelihood function changes dramatically with /sf- Within a small range of log/sF, 
from 1.8 to 2.0, the iso-likelihood surface moves a long distance in the asF-«RH plane and 
becomes elongated when log/gp is close to 2.0. This type of complex structure and dramatic 
change happens also in the other dimensions. As a consequence, the posterior is significantly 
more complex than one might expect by only looking at the fully marginalised distributions 
presented in Figure HI The fine structure and complex topology of the posterior also make it 
clear that it is extremely difficult to find the best fit by tuning model parameters by hand. 
It also explains why it is extremely computationally challenging to properly sample the 
posterior. The MCMC algorithm must navigate along extremely thin and curved surfaces in 
multiple dimensions, often with very small gradients in likelihood to find the most probable 
models. For example the results presented here required more than a million SAM evaluations 
with approximately 5 x 10 4 2GHz Opteron CPU hours. 
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5 THE IMPACT OF PRIOR CHOICE 



In this section, we study the affect of the prior distribution on the final inference by selectively 
applying narrow prior distributions for some of the parameters. This mimics the common 
practise of fixing some model parameters. In Case 1, three of the 13 parameters are given 
narrow priors. The value of /?sf is limited to the narrow range [—0.2, 0.2], to mimic a fla t 



power-law for star formation efficiency as adopted in some SAMs (e.g. ICroton et al.ll2006l ). 
The parameter Vsf has little effect so we set log(VsF/kms _1 ) in the narrow range [2.1,2.3]. 
Furthermore, we assign a narrow prior, [1.9,2.1], to log/sF, corresponding to the choice 
E s f ~ 10M Q /pc 2 and r disc rs 3r disc0 , which is often used in previous SAMs (e.g. 



Croton et al. 



20061 ). All the other prior distributions are the same as in the fiducial case considered §|4] 
(Case 0). The resulting marginalised posterior distributions in Figure El and summarised in 
Table [1] as the second prior and posterior 95% confidence range listed for each parameter, 
show that the distribution of all the parameters becomes more compact. The improvement of 
prior information on some parameters not only tightens the constraints on those parameters 
themselves but can also help break degeneracies in other dimensions. For example, since the 
star formation law is restricted to have weak dependence on halo mass, the efficiency of SN 
reheating is forced to be a steep power-law of halo circular velocity. For similar reasons, the 
degeneracies of the other parameters with /3rh are all also reduced. Note that this restrictive 
prior is not located near the dominant posterior mode with unrestricted priors (cf. Fig. [4]). 
Moreover, the bulk of the Case 1 posterior has very low probability in the Case posterior. 
Nevertheless, the quality of the fit does not change much, as one can see from the reproduced 
stellar mass function shown in the upper- right panel of Figure EJ This illustrates the danger 
in fixing the values of parameters to plausible values especially when there is no compelling 
prior reason for imposing such a strong constraint. 

Case 1 requires that the SN feedback be a very steep function of the halo circular velocity 



when the star formation efficiency is 



Kauffmann et al 



1999 
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breed to change slowly with halo mass. Early SAMs (e.g. 



20051 ) assumed a weak dependence of the SN feedback 
on halo mass (/3rh ~ 2) and concluded that the number of faint galaxies are over-predicted 
if /?sf ~ 0. However, whether or not a good fit can still be obtained by changing the 
other parameters while keeping /?sf ~ and /3rh ~ 2 requires a full exploration of the high- 
dimensional parameter space. Case 2 addresses this question by imposing the additional prior 
restriction, /3rh £ [1-8, 2.2], and in Figure[7]we show the resulting marginalised distributions, 
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which are summarised in the third entry for each parameter in Table [U The modes have 
moved substantially with respect to those in Case 1. To compensate for the weakened SN 
reheating in small halos owing to the assumed weak dependence of SN reheating on halo 
circular velocity, the model employs a much lower efficiency for star formation and a larger 
reheating amplitude; the mode moves from the lower-right to the upper-left in the log «sf — 
logaRH plane. For similar reasons, the posterior mode in the other dimensions also change. 

The posterior-weighted stellar mass function shown in the upper-right panel of Figure [7] 
still reproduces the observed stellar mass function even though the power indices /3sf and 
/3rh are both fixed to low values. This illustrates the importance of carefully specifying the 
prior distribution for each parameter, especially when a parameter is weakly constrained, 
and the necessity for fully characterising the posterior distribution over its full domain. 

In summary, the results obtained in this section show that assigning restrictive prior 
distributions to some parameters can significantly reduce the volume of the parameter space 
and tighten the constraints on all the parameters in SAMs of galaxy formation. Thus, any 
prior knowledge, either from observation or physical consideration, can help the model infer- 
ence and hence improve our understanding of galaxy formation. However, this also indicates 
that full posterior distribution can be very different from the posterior distribution with 
restricted priors so that adopting unsubstantiated priors to fix parameters can lead to an 
erroneous inference for all the other parameters. Also, it would not be straightforward to 
estimate the potential effects of introducing additional processes when the existing model 
parameters are held fixed. Because of the covariance between model parameters, adding a 
new model parameter may change the posterior significantly. 

6 IMPACT OF THE ERROR MODEL 

The observational error model directly influences the value of likelihood and the shape of 
the cost function in parameter space. However, the impact of the error model has not been 
carefully considered in SAMs. In this section, we explore the effect of incorrect error estimates 
on the resulting inference. 

Astronomers traditionally differentiate two kinds of errors, statistical errors and sys- 
tematic errors. Statistical errors result from well-understood processes with known parent 
distributions (e.g. sampling error) while systematic errors come from the underlying as- 
sumptions made for the measurements. From the Bayesian context, a systematic error is 
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the result of poor prior information and often results in bias. For the stellar mass function 
considered here, the total error budget consists of the independent statistical errors of indi- 
vidual stellar mass bins owing to the finite number of galaxies used in estimating the stellar 
mass function, and systematic errors, which arise from uncertainties in the stellar popula- 
tion synthetic models used to estimate the stellar mass from the observed luminosity. These 
systematic uncertainties correlate the bins. For example, the uncertainty in the assumed 
IMF will increase or decrease the stellar masses of all the galaxies in the same sense. 

When errors in different mass bins are correlated, the likelihood function may be ap- 
proximated as follows: 



where <£ b s and "3? mo d are the vectors of the observed and predicted stellar mass functions over 
the stellar mass bins, S is the covariance matrix that describes the correlated error model, 
and / is the rank of the covariance matrix. For independent errors, all the off-diagonal terms 
vanish and the likelihood reduces to Eq.( fi~6]) . 

To test the effect of correlated error, we construct a synthetic stellar mass function with 
correlated errors that mimic the systematic uncertainties in real observation. We choose 
a truncated series of Chebyshev polynomials to represent the observed stellar mass func- 
tion. The low-order coefficients specify the overall shape of the function, while the higher 
orders characterise small scale features. We find that Chebyshev polynomials up to order 4 
nicely fits the logarithmic stellar mass function, log $ (log m*); the best-fit coefficients are 
[-4.17,-1.26,-0.516,-0.274,-0.114]. We choose the standard deviations of these coeffi- 
cients [0.05,0.10,0.12,0.08,0.03] to represent the correlated uncertainties in the measure- 
ments. Then, we calculate the covariance matrix of this synthetic data using 1000 Monte 
Carlo realisations of the Chebyshev polynomials and use the synthetic data and the derived 
covariance matrix to constrain the parameters in our SAM. 

Figure M compares the marginalised posteriori for four pairs of model parameters obtained 
with the full covariance matrix (upper panels) and those obtained with the diagonal terms 
only (lower panels). Removing the off-diagonal terms is equivalent to ignoring the covariance. 
Clearly, the contours produced with the full covariance matrix are more compact. This is 
expected because the correlation of the errors between the different bins implies that the 
total independent error in the data is smaller. There are also noticeable changes in the shape 
and the orientation of the posterior distribution, indicating that it is important to model 




(17) 
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the covariance of the data properly to make correct model inferences. For example, in the 
/3sF-log«sF pl ane a new m ode appears with log«sF ~ 0.5. 

7 SUMMARY AND DISCUSSION 

Many of the physical processes parametrised in semi-analytical models of galaxy formation 
remain poorly understood and under specified. This has two critically important conse- 
quences for inferring constraints on the physical parameters: 1) prior assumptions about 
the size of the domain and the shape of the parameter distribution will strongly affect any 
resulting inference; and 2) a very large parameter space must be fully explored to obtain 
an accurate inference. Moreover, both must be done together. Both of these issues are nat- 
urally tackled with a Bayesian approach that allows one to constrain theory with data in 
a probabilistically rigorous way. In this paper, we have presented a semi-analytic model of 
galaxy formation in the framework of Bayesian inference and illustrated its performance 
on a test problem. Our sixteen-parameter semi-analytic model incorporates the most com- 
monly used parameterisations of important physical processes from existing SAMs including 
star formation, SN feedback, galaxy merger, and AGN feedback. We combined this model 
with the Bayesian Inference Engine developed at the University of Massachusetts. The BIE 
is an extensible MPI-based software package for developing, testing and running advanced 
Markov-Chain Monte-Carlo algorithms on large data sets. The resulting tool allows the 
exploration of the posterior distribution of a large number of model parameters, and to 
constrain models over multiple data sets in a statistically rigorous way. 

To demonstrate the power of this approach, we used the observationally derived stellar 
mass function of galaxies to constrain a number of important model parameters and to study 
the posterior probability distribution of the model parameters. The posterior probability 
is the conditional probability obtained after the data is taken into account. This is an 
important quantity for model inference because it specifies, for a given model family, the 
probability distribution of the parameters in the full parameter space. The Bayesian model 
inference requires one to examine the full posterior probability distribution, emphasising 
the marginalised probability instead of the 'best' parameters. We find that the posterior 
distribution has a very complex structure and topology, indicating that finding the best fit by 
tweaking model parameters is improbable. Moreover, the posterior clearly shows that many 
model parameters are strongly covariant and, therefore, the inferred value of a particular 
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parameter can be significantly affected by the priors used for the other parameters. As a 
consequence, one needs to have the knowledge about the posterior distribution in the entire 
parameter space when making a model inference, because it is very likely to miss significant 
modes if some parameters are fixed. We have demonstrated here that restricting the prior of 
some parameters without physical reasons can exclude models that should be considered as 



"P 
by 



ausible" , and hence lead to biased inferences. We note that the "implausibility" proposed 



Bower et al. 



( 120 101 ) is a measure of the adequacy of a fit defined for individual parameter 
sets. This is different from the posterior distribution discussed here. Finally, with the use of 
synthetic data to mimic systematic uncertainties in the reduced data, we have shown that 
model parameter inferences can be significantly affected by the use of an incorrect error 
model. This clearly demonstrates that an accurate error model (both sampling error and 
systematic uncertainties) is crucial to using observational data correctly, and conversely, a 
data-model comparison without an accurate error model is likely to be erroneous. 

The method developed here can be straightforwardly applied to other data sets and to 
multiple data sets simultaneously. Large galaxy surveys available now and in the near future 
will provide many more data sets to characterise the properties of the galaxy population 
not only at the present time but also at high redshifts. The Bayesian-inference based SAM 
described in this paper provides a framework for parameter estimation (e.g constraining the 
parameters in theoretical models given particular data sets), for hypothesis testing (e.g. com- 
paring the support for particular models given the data), and for predicting the power of new 
observations to constraining models of interest. In addition, the Bayesian approach explicitly 
builds on previous results by incorporating the constraints from previous inferences into new 
data sets; the Bayesian Inference Engine is designed to do this automatically. The approach 
developed here will produce probabilistically rigorous constraints on theoretical models, and 
facilitate understanding the underlying physical processes that shape the observed galaxy 
population. For many processes in galaxy formation, competing models have been proposed 
but not quantitatively compared. The marginalised likelihood or Bayes Evidence, which can 
be directly derived from the p osterior, an d explicit model comparison techniques, such as 



the reversible jump algorithm ( lGreenlll995l ). can provide a quantitative comparison between 
different models for given data. The Bayesian hypothesis test can, therefore, be used to dis- 
criminate between models. Finally, the prediction for an observable including the inferential 
uncertainties can be obtained by marginalising over the posterior. Such predictions can be 
used to assess the power of new observations. In a series of forthcoming papers, we will use 
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the scheme developed here to make inferences from various data sets, focusing on a number 
of the aspects discussed above. 

It should be pointed out, however, that the method developed here has its limitations, and 
improvements in the methodology are still needed to realise its full potential. For example, 
since the posterior is explored using MCMC sampling, one expects the computation to 
be more challenging as the dimensionality of the problem increases. Higher dimensionality 
means a larger parameter volume to be explored. If the new dimensions do not have a strong 
impact on the model predictions, the posterior will be extended in the new dimensions and 
require longer chains and/or more parallel chains to sample. If, on the other hand, the 
posterior is very compact in the new dimensions, the chance for a MCMC chain to find a 
mode will decrease. In practise, we expect the situation to be between these two extremes, 
but both cases make the computation more challenging. In addition, as the complexity of 
the model increases, one expects the topology of the posterior distribution to become more 
complicated, making it harder for the MCMC chains to mix well. We note that these general 
difficulties challenge any method of parameter space exploration, so one has to try different 
algorithms to determine the one that is the best suited for the problem in question. For the 
Tempered Differential Evolution MCMC algorithm adopted here, we may have to increase 
the maximum temperature for tempering to enhance mixing and/or increase the number of 
chains. Both of these will increase the number of SAM evaluations, each taking of order one 
minute to compute. One way to increase the speed of t he likelihood evalua tion would be 



to adopt a model emulator, such as the one proposed by iBower et al.l ( 120101 ). However, we 
note that significant computation is required to train such an emulator, and that the error 
introduced by the emulator has to be carefully controlled. 

The adding of new data to constrain the model will alter the posterior distribution. 
For example, the additional data may reduce the credible volume and/or produce more 
complex features in the posterior distribution. Except for increasing the computation time 
and/or impr oving the samplin g efficiency of the algorithm as described above, the Bayesian 



update (e.g. IWeinberg Il2010bl ) provides a natural solution to the problem. The data-based 
hierarchical approach, eliciting prior information from the posterior distribution constrained 
by less data, allows us to update the posterior as more and more data are included into the 
inference. This approach may reduce the computation time significantly. On the other hand, 
however, it is also possible that the additional data is not mutually compatible with the old 
data and the main modes would change significantly. Finally, the compatibility of the model 
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with the two data sets, e.g. how well the model accommodates the data, should be assessed 
with a goodness-of-fit test. 



Although the Bayesian approach is conceptually attractive, these computations are very 



costly. One way to increase the speed o 
emulator, such as the one proposed by 



the likelihood eval uation would be to adopt a model 
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( 20101 ). However, we note that the error 



introduced by the emulator has to be carefully controlled. For example, Figure |5] illustrates 
the complexity of the posterior distribution in parameter space. It remains to be verified that 
such a correlation can be accurately represented by an emulator. If the posterior distribution 
from the emulator approach could be used in some way to provide a prior distribution 
for the direc t simulation, the tw o methods together may prove to be more powerful than 



either alone. 
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( 120101 ) proposed an implausibility measure to characterise the fit 
of the predictions to the data. Like the prior distribution, the implausibility reduces the 
parameter space to be explored before new data is included. This refining inference can also 
be accomplished by a Bayesian update, which uses the posterior obtained from part of the 
data as the prior for the next level of the inference that uses the remaining data. 



As mentioned earlier, any model can only be considered as an approximation to reality, 
because of model uncertainties including stochasticity of the involved processes and inac- 
curacy of the model prescriptions. These uncertainties may be included in the likelihood 
if an ass umption is made ab out their statistical properties. The 'model discrepancy' intro- 



duced in 



Bower et al. 



(l201ol ) is an attempt along these lines. A large amount of the model 
uncertainties arise in SAMs because some processes are not well understood and can be 
modelled in various plausible ways. If a specific set of prescriptions is adopted by the model 
without taking into account their uncertainties, any inferences made are only for the model 
family specified by the adopted prescriptions. To include the model uncertainties, one can 
generalise the prescriptions to encompass a large number of possibilities. Alternatively, one 
can consider a number of plausible model families and use Bayes Evidence to discriminate 
between the different model families. In a forthcoming paper, we will demonstrate how the 
marginalised likelihood or Bayes Evidence can provide a quantitative comparison between 
different models for given data. 
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Figure 1. Flow chart describing the calculation of our semi-analytic model of galaxy formation. The parameters explored in 
the present paper are listed in the corresponding blocks. 
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Figure 2. A flow chart describing the structure of our Bayesian-inference based semi-analytic model. 
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Figure 3. The trajectories of four MCMC chains run with the Tempered Differential Evolution algorithm in the dimension of 
the parameter Vsf- The three solid lines (black, red and blue) are three randomly selected converged chains, while the dotted 
line (green) is an outlier chain as identified by the Gelman-Rubin statistic. 
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Figure 4. The marginalised posterior distribution for key parameters for our fiducial run (Case 0). The colour coding represents 
confidence levels as shown by the colour-bar on the top of the figure. The horizontal bars in the one-dimensional marginals 
indic ate the 95% confidence interval. The observed stellar mass function of galaxies (black line and error bars) from jBell et aLl 
I2003T) together with the marginalised model prediction is inset. The red solid line shows the median value of the prediction, 
while the yellow shaded region represents the 95% confidence interval. 
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Figure 5. The likelihood function in a three-dimensional space defined by logojsp, logogN and log/gp. Other parameters 
are fixed at the values where the posterior peaks. The inner surface (blue) has log(L) = —4, and the outer surface (red) has 
log(L) = —9.9 (so the latter value is approximately "l-sigma" if the likelihood function were Gaussian). The contour lines on 
the planes are linearly spaced in the logarithmic scale with a spacing of 13.4. This figure demonstrates the complexity of the 
likelihood function and the posterior distribution. 
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Figure 6. The marginalised posterior distribution for key parameters in Case l.Very restrictive priors are assumed for the 
parameters, /3gp, Vgp and /sf, whose central values are indicated by magenta lines. 
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Figure 7. The marginalised posterior distribution of key parameters for Case 2. This includes the restrictions of Case 1 as in 
Fig. [6] with an additional restrictive prior for /3rh- 
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Figure 8. A comparison of the posterior distribution obtained for a likelihood function including covariance (upper row, 
I17> . and using only the diagonal terms of the covariance matrix (lower row). 
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